Note: symdur only contains info on the duration of symptoms prior to admission, so all these analyses may be useless. Seems like much of the total duration of symptoms data is missing.

Notes on data formatting from Deborah Wentworth:

Notes on duration of symptoms: Currently using symdur here, because I want to do this analysis on the continuous duration of symptoms. But I’m waiting for an email from Deb clarifying whether or not this variable represents total duration of symptoms or just duration prior to enrollment.

Visualize the relationship between age and duration of symptoms

## 
##  Spearman's rank correlation rho
## 
## data:  valid$age and valid$symdur
## S = 66961000, p-value = 0.05384
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##        rho 
## 0.07015608

There is a trend toward a very weak positive correlation here, but looking at the data shows that it’s not very strong.

Visualize the relationship between age and prob of H3N2 infection

## 
##  Spearman's rank correlation rho
## 
## data:  valid$age and valid$symdur
## S = 63158000, p-value = 1.29e-05
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##       rho 
## 0.1568748

Results suggest a weak, positive correlation, although this may be influenced by outliers.

Visualize the relationship between country and symptom duration

H1N1

H3N2

No clear effect of country

Visualize the relationship between season and symptom duration

H1N1

No clear effect of season.

H3N2

No clear effect of season –> Try models that don’t include season and country as blocking vars

H1N1 10-fold CV

We know the relationship between age and probability of infection should be non-linear. Use cross-validation to verify, and to choose the number of degrees of freedom to include in a spline

Plot the overall test error rate vs. df

Clear preference for 2 df

H3N2 10-fold CV

We know the relationship between age and probability of infection should be non-linear. Use cross-validation to verify, and to choose the number of degrees of freedom to include in a spline

Plot the overall test error rate vs. df

Conclusion: Clear preference for linear fit for H3N2

Set up a logistic regression that includes the following predictors:

  • age
  • imprinting status
  • vaccination status
  • antiviral use
  • season
  • country

Clean data for fitting

## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## 
## Argentina Australia   Austria   Belgium     Chile     China   Denmark 
##        50        69         3        11         1        15        68 
##   Germany    Greece    Norway      Peru    Poland Singapore     Spain 
##        41        83         7        16         6         7        40 
##  Thailand        UK       USA 
##        64       127       110
## [1] 718

First fit a model where the only independent variable is age

Treat vaccination, antivial use, underlying symptoms, country and season as blocking variables with fixed effects

library(splines)
## H1N1 fit to medical history only (no effect of country, season, age or imprinting)
fit00 = glm(symdur ~ anyvac + anyav + anydx, data = train.H1N1)
summary(fit00)
## 
## Call:
## glm(formula = symdur ~ anyvac + anyav + anydx, data = train.H1N1)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -6.224  -2.967  -1.224   1.301  42.509  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   5.4424     0.5525   9.851   <2e-16 ***
## anyvac1       0.2678     0.4503   0.595    0.552    
## anyav1        0.2566     0.5228   0.491    0.624    
## anydx1        0.5246     0.4483   1.170    0.242    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 24.04477)
## 
##     Null deviance: 15523  on 646  degrees of freedom
## Residual deviance: 15461  on 643  degrees of freedom
## AIC: 3899.5
## 
## Number of Fisher Scoring iterations: 2
## H1N1 fit to country, season and medical history only (no effect of age or imprinting)  
fit0 = glm(symdur ~ anyvac + anyav + anydx + country + season, data = train.H1N1)
summary(fit0)
## 
## Call:
## glm(formula = symdur ~ anyvac + anyav + anydx + country + season, 
##     data = train.H1N1)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -7.054  -2.854  -1.114   1.535  38.363  
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        6.2162     1.4141   4.396  1.3e-05 ***
## anyvac1           -0.2739     0.4812  -0.569  0.56940    
## anyav1             0.6877     0.5410   1.271  0.20416    
## anydx1             0.1985     0.4645   0.427  0.66923    
## countryAustralia   0.4665     1.2092   0.386  0.69978    
## countryBelgium     0.6131     2.0853   0.294  0.76884    
## countryDenmark    -0.7396     1.3440  -0.550  0.58233    
## countryGermany    -0.5488     1.4602  -0.376  0.70714    
## countryGreece     -2.0609     1.3171  -1.565  0.11816    
## countryOther      -1.6289     1.3276  -1.227  0.22030    
## countrySpain      -0.3979     1.4426  -0.276  0.78278    
## countryThailand   -3.4437     1.1906  -2.892  0.00396 ** 
## countryUK         -0.3995     1.2637  -0.316  0.75198    
## countryUSA        -0.8314     1.2854  -0.647  0.51800    
## seasonNH.10.11     0.6383     0.7514   0.849  0.39600    
## seasonNH.11.12     0.6758     1.9688   0.343  0.73155    
## seasonNH.12.13     0.8007     1.5247   0.525  0.59964    
## seasonNH.13.14     0.7895     0.7756   1.018  0.30910    
## seasonNH.14.15     1.5225     1.5045   1.012  0.31193    
## seasonNH.15.16     0.3512     0.6779   0.518  0.60462    
## seasonNH.16.17    -0.3809     1.7783  -0.214  0.83047    
## seasonSH.10       -0.1925     1.5268  -0.126  0.89970    
## seasonSH.11       -1.7149     1.5618  -1.098  0.27260    
## seasonSH.12        0.8042     2.3103   0.348  0.72788    
## seasonSH.13        0.7138     1.4782   0.483  0.62937    
## seasonSH.14        3.3423     1.8262   1.830  0.06771 .  
## seasonSH.15       -1.7138     1.9668  -0.871  0.38391    
## seasonSH.16       -1.1813     1.0953  -1.078  0.28124    
## seasonSH.17        0.8921     2.3010   0.388  0.69838    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 23.56294)
## 
##     Null deviance: 15523  on 646  degrees of freedom
## Residual deviance: 14562  on 618  degrees of freedom
## AIC: 3910.7
## 
## Number of Fisher Scoring iterations: 2
## H1N1 fit to age only, plus fixed effects of vaccination, av use, underlying conditions, country and season
fit1 = glm(symdur ~ ns(age, knots = 65) + anyvac + anyav + anydx + country + season, data = train.H1N1)
summary(fit1)
## 
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + anyvac + anyav + 
##     anydx + country + season, data = train.H1N1)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -7.477  -2.842  -1.052   1.561  38.175  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           4.72824    1.50040   3.151  0.00170 **
## ns(age, knots = 65)1  3.69432    1.42018   2.601  0.00951 **
## ns(age, knots = 65)2 -1.09713    1.19831  -0.916  0.36025   
## anyvac1              -0.32593    0.49096  -0.664  0.50702   
## anyav1                0.64715    0.53848   1.202  0.22990   
## anydx1               -0.09236    0.48236  -0.191  0.84821   
## countryAustralia      0.42180    1.21037   0.348  0.72759   
## countryBelgium        0.71160    2.08309   0.342  0.73276   
## countryDenmark       -0.64419    1.34493  -0.479  0.63213   
## countryGermany       -0.18651    1.46318  -0.127  0.89861   
## countryGreece        -2.16562    1.31107  -1.652  0.09908 . 
## countryOther         -1.64659    1.32349  -1.244  0.21392   
## countrySpain         -0.47702    1.43761  -0.332  0.74014   
## countryThailand      -3.34321    1.18597  -2.819  0.00497 **
## countryUK            -0.25376    1.26653  -0.200  0.84127   
## countryUSA           -0.82410    1.28460  -0.642  0.52142   
## seasonNH.10.11        0.55064    0.75111   0.733  0.46378   
## seasonNH.11.12        1.25181    1.96937   0.636  0.52525   
## seasonNH.12.13        0.46299    1.52508   0.304  0.76155   
## seasonNH.13.14        0.68947    0.77270   0.892  0.37259   
## seasonNH.14.15        1.55074    1.51573   1.023  0.30666   
## seasonNH.15.16        0.21181    0.68527   0.309  0.75736   
## seasonNH.16.17       -0.39135    1.77006  -0.221  0.82509   
## seasonSH.10           0.13957    1.52445   0.092  0.92708   
## seasonSH.11          -1.42473    1.55733  -0.915  0.36063   
## seasonSH.12           1.41702    2.30926   0.614  0.53969   
## seasonSH.13           1.08378    1.47666   0.734  0.46326   
## seasonSH.14           3.37194    1.82386   1.849  0.06497 . 
## seasonSH.15          -2.08363    1.96142  -1.062  0.28851   
## seasonSH.16          -1.19781    1.09389  -1.095  0.27394   
## seasonSH.17           0.53899    2.29319   0.235  0.81426   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 23.32775)
## 
##     Null deviance: 15523  on 646  degrees of freedom
## Residual deviance: 14370  on 616  degrees of freedom
## AIC: 3906.2
## 
## Number of Fisher Scoring iterations: 2
## H1N1 fit to age plus imprinting, plus fixed effects of vaccination, av use, underlying conditions, country and season
fit2 = glm(symdur ~ ns(age, knots = 65) + p.g1.protection + anyvac + anyav + anydx + country + season, data = train.H1N1)
summary(fit2)
## 
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + p.g1.protection + 
##     anyvac + anyav + anydx + country + season, data = train.H1N1)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -7.374  -2.791  -1.075   1.618  38.245  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           4.70242    1.50078   3.133  0.00181 **
## ns(age, knots = 65)1  5.05757    2.02296   2.500  0.01268 * 
## ns(age, knots = 65)2 -0.46876    1.37006  -0.342  0.73236   
## p.g1.protection      -0.74556    0.78783  -0.946  0.34434   
## anyvac1              -0.32310    0.49101  -0.658  0.51077   
## anyav1                0.61757    0.53943   1.145  0.25271   
## anydx1               -0.07143    0.48290  -0.148  0.88246   
## countryAustralia      0.45781    1.21107   0.378  0.70554   
## countryBelgium        0.82620    2.08679   0.396  0.69230   
## countryDenmark       -0.61053    1.34552  -0.454  0.65017   
## countryGermany       -0.14568    1.46394  -0.100  0.92076   
## countryGreece        -2.08020    1.31428  -1.583  0.11399   
## countryOther         -1.57598    1.32570  -1.189  0.23498   
## countrySpain         -0.47964    1.43773  -0.334  0.73879   
## countryThailand      -3.31511    1.18644  -2.794  0.00537 **
## countryUK            -0.16912    1.26980  -0.133  0.89409   
## countryUSA           -0.73398    1.28823  -0.570  0.56905   
## seasonNH.10.11        0.53954    0.75127   0.718  0.47292   
## seasonNH.11.12        1.35576    1.97260   0.687  0.49215   
## seasonNH.12.13        0.38859    1.52724   0.254  0.79924   
## seasonNH.13.14        0.61820    0.77643   0.796  0.42622   
## seasonNH.14.15        1.44206    1.52020   0.949  0.34320   
## seasonNH.15.16        0.10177    0.69512   0.146  0.88365   
## seasonNH.16.17       -0.36119    1.77050  -0.204  0.83842   
## seasonSH.10           0.18536    1.52535   0.122  0.90332   
## seasonSH.11          -1.47592    1.55840  -0.947  0.34397   
## seasonSH.12           1.47016    2.31014   0.636  0.52476   
## seasonSH.13           1.07195    1.47683   0.726  0.46821   
## seasonSH.14           3.38689    1.82408   1.857  0.06382 . 
## seasonSH.15          -2.04971    1.96191  -1.045  0.29655   
## seasonSH.16          -1.21520    1.09413  -1.111  0.26715   
## seasonSH.17           0.51003    2.29359   0.222  0.82410   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 23.33171)
## 
##     Null deviance: 15523  on 646  degrees of freedom
## Residual deviance: 14349  on 615  degrees of freedom
## AIC: 3907.2
## 
## Number of Fisher Scoring iterations: 2
## H1N1 fit to age only, plus fixed effects of vaccination, av use, underlying conditions NO COUNTRY AND SEASON
fit3 = glm(symdur ~ ns(age, knots = 65) + anyvac + anyav + anydx, data = train.H1N1)
summary(fit3)
## 
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + anyvac + anyav + 
##     anydx, data = train.H1N1)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -6.516  -2.903  -1.217   1.358  42.092  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.1430     0.7317   5.662 2.26e-08 ***
## ns(age, knots = 65)1   2.8045     1.3238   2.118   0.0345 *  
## ns(age, knots = 65)2  -1.9740     1.1414  -1.729   0.0842 .  
## anyvac1                0.2967     0.4550   0.652   0.5145    
## anyav1                 0.2028     0.5214   0.389   0.6975    
## anydx1                 0.3367     0.4635   0.727   0.4678    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 23.82754)
## 
##     Null deviance: 15523  on 646  degrees of freedom
## Residual deviance: 15273  on 641  degrees of freedom
## AIC: 3895.6
## 
## Number of Fisher Scoring iterations: 2
## H1N1 fit to age plus imprinting, plus fixed effects of vaccination, av use, underlying conditions, NO COUNTRY AND SEASON
fit4 = glm(symdur ~ ns(age, knots = 65) + p.g1.protection + anyvac + anyav + anydx, data = train.H1N1)
summary(fit4)
## 
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + p.g1.protection + 
##     anyvac + anyav + anydx, data = train.H1N1)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -6.442  -2.857  -1.189   1.325  42.195  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            4.1516     0.7318   5.673 2.12e-08 ***
## ns(age, knots = 65)1   4.1425     1.8885   2.194   0.0286 *  
## ns(age, knots = 65)2  -1.3668     1.2947  -1.056   0.2915    
## p.g1.protection       -0.7624     0.7674  -0.993   0.3208    
## anyvac1                0.3041     0.4550   0.668   0.5042    
## anyav1                 0.1625     0.5230   0.311   0.7561    
## anydx1                 0.3669     0.4645   0.790   0.4298    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 23.82802)
## 
##     Null deviance: 15523  on 646  degrees of freedom
## Residual deviance: 15250  on 640  degrees of freedom
## AIC: 3896.6
## 
## Number of Fisher Scoring iterations: 2
## Plot the fits for each model side by side
par(mfrow = c(1, 2), las = 3, mar = c(6, 6, 6, 3), cex.axis = .7)
library(gam)
## Loading required package: foreach
## Loaded gam 1.14-4
plot.gam(fit00, se = T, col = 'dodgerblue', main = 'Simplified Baseline')

plot.gam(fit0, se = T, col = 'dodgerblue', main = 'Baseline')

plot.gam(fit1, se = T, col = 'dodgerblue', main = 'Baseline + age + imp')

plot.gam(fit2, se = T, col = 'dodgerblue', main = 'Baseline + age + imp')

plot.gam(fit3, se = T, col = 'dodgerblue', main = 'Simple baseline + age + imp')

plot.gam(fit4, se = T, col = 'dodgerblue', main = 'Simple baseline + age + imp')

pdf('003Symdur_H1N1.pdf')
plot.gam(fit4, se = T, col = 'dodgerblue', main = 'Simple baseline + age + imp')
dev.off()
## quartz_off_screen 
##                 2
# Predict probs for each patient
pred00 = predict(fit00, newdata = test.H1N1)
pred0 = predict(fit0, newdata = test.H1N1)
pred1 = predict(fit1, newdata = test.H1N1)
pred2 = predict(fit2, newdata = test.H1N1)
pred3 = predict(fit3, newdata = test.H1N1)
pred4 = predict(fit4, newdata = test.H1N1)
  
# Estimate the simulated error rate
MSE = numeric(6); names(MSE) = c('simple.baseline', 'baseline', 'baeline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')
MSE[1] = mean((pred00 - test.H1N1$symdur)^2)
MSE[2] = mean((pred0 - test.H1N1$symdur)^2)
MSE[3] = mean((pred1 - test.H1N1$symdur)^2)
MSE[4] = mean((pred2 - test.H1N1$symdur)^2)
MSE[5] = mean((pred3 - test.H1N1$symdur)^2)
MSE[6] = mean((pred4 - test.H1N1$symdur)^2)
sort(MSE)
## simple.baseline+age+imp     simple.baseline+age        baseline+age+imp 
##                73.82638                74.18532                75.63663 
##         simple.baseline             baeline+age                baseline 
##                75.66241                76.21245                78.30160
AIC = c(fit00$aic, fit0$aic, fit1$aic, fit2$aic, fit3$aic, fit4$aic); names(AIC) =  c('simple.baseline', 'baseline', 'baeline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')
del.AIC = sort(AIC - min(AIC))
del.AIC
##     simple.baseline+age simple.baseline+age+imp         simple.baseline 
##                0.000000                1.002958                3.887447 
##             baeline+age        baseline+age+imp                baseline 
##               10.545393               11.603895               15.133104
load('master.AIC.table.H1N1.RData')
master.AIC.table['003Symdur',  c('simple.baseline', 'baseline', 'baseline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')] = AIC - min(AIC)
save(master.AIC.table, file = 'master.AIC.table.H1N1.RData')
rm(master.AIC.table)

Clean H3N2 data for fitting

## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## integer(0)
## 
## Argentina Australia   Belgium     China   Denmark   Germany    Greece 
##       111       122         3         7        26        16        24 
##      Peru Singapore     Spain  Thailand        UK       USA 
##         9        24        28       125        84       168
## [1] 747
library(splines)
## H3N2 fit to medical history only (no effect of country, season, age or imprinting)
fit00 = glm(symdur ~ anyvac + anyav + anydx, data = train.H3N2)
summary(fit00)
## 
## Call:
## glm(formula = symdur ~ anyvac + anyav + anydx, data = train.H3N2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
##  -7.832   -2.998   -1.205    0.795  142.495  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   7.5050     1.0099   7.432 3.29e-13 ***
## anyvac1       1.2075     0.6108   1.977  0.04845 *  
## anyav1       -2.6264     0.7837  -3.351  0.00085 ***
## anydx1        0.1191     0.8515   0.140  0.88879    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 58.11606)
## 
##     Null deviance: 39766  on 672  degrees of freedom
## Residual deviance: 38880  on 669  degrees of freedom
## AIC: 4649.9
## 
## Number of Fisher Scoring iterations: 2
## H3N2 fit to country, season and medical history only (no effect of age or imprinting)  
fit0 = glm(symdur ~ anyvac + anyav + anydx + country + season, data = train.H3N2)
summary(fit0)
## 
## Call:
## glm(formula = symdur ~ anyvac + anyav + anydx + country + season, 
##     data = train.H3N2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -10.972   -2.873   -0.964    1.036  133.189  
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       5.23197    3.08747   1.695   0.0906 .  
## anyvac1           0.70120    0.69945   1.003   0.3165    
## anyav1           -1.88909    0.82156  -2.299   0.0218 *  
## anydx1            0.04999    0.88978   0.056   0.9552    
## countryAustralia  0.44214    1.32185   0.334   0.7381    
## countryBelgium   -1.83483    4.69204  -0.391   0.6959    
## countryDenmark    8.97077    2.14544   4.181  3.3e-05 ***
## countryGreece    -1.21917    2.26788  -0.538   0.5911    
## countryOther      1.24425    2.01647   0.617   0.5374    
## countrySingapore -1.46714    1.83897  -0.798   0.4253    
## countrySpain      0.58980    2.16634   0.272   0.7855    
## countryThailand  -1.49057    1.41048  -1.057   0.2910    
## countryUK         0.38486    1.72155   0.224   0.8232    
## countryUSA       -0.23631    1.63350  -0.145   0.8850    
## seasonNH.11.12    2.50392    2.96734   0.844   0.3991    
## seasonNH.12.13    2.55262    2.58711   0.987   0.3242    
## seasonNH.13.14    1.56102    2.88256   0.542   0.5883    
## seasonNH.14.15    2.60802    2.55482   1.021   0.3077    
## seasonNH.15.16    2.07245    3.04396   0.681   0.4962    
## seasonNH.16.17    0.80441    2.57899   0.312   0.7552    
## seasonSH.10      -0.91160    4.56512  -0.200   0.8418    
## seasonSH.11       0.19072    3.53065   0.054   0.9569    
## seasonSH.12       1.63826    3.11286   0.526   0.5989    
## seasonSH.13      -0.62362    3.47057  -0.180   0.8575    
## seasonSH.14       2.08340    2.91735   0.714   0.4754    
## seasonSH.15       1.51763    2.84487   0.533   0.5939    
## seasonSH.16       1.72769    2.84630   0.607   0.5441    
## seasonSH.17       2.03845    2.89072   0.705   0.4810    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 56.40278)
## 
##     Null deviance: 39766  on 672  degrees of freedom
## Residual deviance: 36380  on 645  degrees of freedom
## AIC: 4653.2
## 
## Number of Fisher Scoring iterations: 2
## H3N2 fit to age only, plus fixed effects of vaccination, av use, underlying conditions, country and season
fit1 = glm(symdur ~ ns(age, knots = 65) + anyvac + anyav + anydx + country + season, data = train.H3N2)
summary(fit1)
## 
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + anyvac + anyav + 
##     anydx + country + season, data = train.H3N2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -11.127   -2.866   -1.021    0.989  132.853  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            5.4989     3.1558   1.742   0.0819 .  
## ns(age, knots = 65)1  -1.1374     2.2680  -0.502   0.6162    
## ns(age, knots = 65)2   0.9499     1.2462   0.762   0.4462    
## anyvac1                0.6418     0.7129   0.900   0.3683    
## anyav1                -1.8476     0.8236  -2.243   0.0252 *  
## anydx1                 0.1335     0.9460   0.141   0.8878    
## countryAustralia       0.5395     1.3293   0.406   0.6850    
## countryBelgium        -1.7323     4.6976  -0.369   0.7124    
## countryDenmark         9.0824     2.1525   4.220  2.8e-05 ***
## countryGreece         -1.1533     2.2712  -0.508   0.6118    
## countryOther           1.2988     2.0194   0.643   0.5203    
## countrySingapore      -1.2189     1.8646  -0.654   0.5135    
## countrySpain           0.6136     2.1689   0.283   0.7773    
## countryThailand       -1.4218     1.4138  -1.006   0.3150    
## countryUK              0.4959     1.7364   0.286   0.7753    
## countryUSA            -0.1030     1.6534  -0.062   0.9504    
## seasonNH.11.12         2.6257     2.9756   0.882   0.3779    
## seasonNH.12.13         2.6569     2.5962   1.023   0.3065    
## seasonNH.13.14         1.6663     2.8894   0.577   0.5644    
## seasonNH.14.15         2.7299     2.5615   1.066   0.2870    
## seasonNH.15.16         2.0174     3.0489   0.662   0.5084    
## seasonNH.16.17         0.9119     2.5845   0.353   0.7243    
## seasonSH.10           -1.0940     4.5814  -0.239   0.8113    
## seasonSH.11            0.3042     3.5368   0.086   0.9315    
## seasonSH.12            1.7526     3.1206   0.562   0.5746    
## seasonSH.13           -0.4184     3.4838  -0.120   0.9044    
## seasonSH.14            2.1878     2.9222   0.749   0.4543    
## seasonSH.15            1.5589     2.8480   0.547   0.5843    
## seasonSH.16            1.7725     2.8510   0.622   0.5343    
## seasonSH.17            2.1312     2.8961   0.736   0.4621    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 56.50456)
## 
##     Null deviance: 39766  on 672  degrees of freedom
## Residual deviance: 36332  on 643  degrees of freedom
## AIC: 4656.3
## 
## Number of Fisher Scoring iterations: 2
## H3N2 fit to age plus imprinting, plus fixed effects of vaccination, av use, underlying conditions, country and season
fit2 = glm(symdur ~ ns(age, knots = 65) + p.g2.protection + anyvac + anyav + anydx + country + season, data = train.H3N2)
summary(fit2)
## 
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + p.g2.protection + 
##     anyvac + anyav + anydx + country + season, data = train.H3N2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -11.248   -2.861   -1.054    1.129  132.569  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           8.17332    3.50635   2.331   0.0201 *  
## ns(age, knots = 65)1 -6.05825    3.62474  -1.671   0.0951 .  
## ns(age, knots = 65)2 -0.54897    1.51375  -0.363   0.7170    
## p.g2.protection      -2.88270    1.65809  -1.739   0.0826 .  
## anyvac1               0.59638    0.71223   0.837   0.4027    
## anyav1               -1.92775    0.82359  -2.341   0.0196 *  
## anydx1                0.16448    0.94467   0.174   0.8618    
## countryAustralia      0.59643    1.32758   0.449   0.6534    
## countryBelgium       -1.16583    4.70157  -0.248   0.8042    
## countryDenmark        9.19426    2.15005   4.276 2.19e-05 ***
## countryGreece        -1.17415    2.26763  -0.518   0.6048    
## countryOther          1.21842    2.01673   0.604   0.5460    
## countrySingapore     -1.34055    1.86299  -0.720   0.4721    
## countrySpain          0.59166    2.16556   0.273   0.7848    
## countryThailand      -1.49361    1.41214  -1.058   0.2906    
## countryUK             0.48342    1.73366   0.279   0.7805    
## countryUSA           -0.04448    1.65120  -0.027   0.9785    
## seasonNH.11.12        2.75155    2.97183   0.926   0.3549    
## seasonNH.12.13        2.81201    2.59362   1.084   0.2787    
## seasonNH.13.14        1.84509    2.88667   0.639   0.5229    
## seasonNH.14.15        2.99777    2.56216   1.170   0.2424    
## seasonNH.15.16        2.27336    3.04763   0.746   0.4560    
## seasonNH.16.17        1.25487    2.58802   0.485   0.6279    
## seasonSH.10          -0.78542    4.57764  -0.172   0.8638    
## seasonSH.11           0.74457    3.54034   0.210   0.8335    
## seasonSH.12           1.74835    3.11574   0.561   0.5749    
## seasonSH.13          -0.25317    3.47966  -0.073   0.9420    
## seasonSH.14           2.22679    2.91770   0.763   0.4456    
## seasonSH.15           1.64230    2.84395   0.577   0.5638    
## seasonSH.16           1.98142    2.84903   0.695   0.4870    
## seasonSH.17           2.40937    2.89598   0.832   0.4057    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 56.32738)
## 
##     Null deviance: 39766  on 672  degrees of freedom
## Residual deviance: 36162  on 642  degrees of freedom
## AIC: 4655.1
## 
## Number of Fisher Scoring iterations: 2
## H3N2 fit to age only, plus fixed effects of vaccination, av use, underlying conditions NO COUNTRY AND SEASON
fit3 = glm(symdur ~ ns(age, knots = 65) + anyvac + anyav + anydx, data = train.H3N2)
summary(fit3)
## 
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + anyvac + anyav + 
##     anydx, data = train.H3N2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
##  -8.176   -3.000   -1.380    0.797  142.251  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            7.8797     1.2618   6.245 7.56e-10 ***
## ns(age, knots = 65)1  -1.0588     2.2634  -0.468  0.64009    
## ns(age, knots = 65)2   0.5504     1.2005   0.458  0.64677    
## anyvac1                1.1954     0.6158   1.941  0.05268 .  
## anyav1                -2.5994     0.7857  -3.308  0.00099 ***
## anydx1                 0.2320     0.9041   0.257  0.79751    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 58.25312)
## 
##     Null deviance: 39766  on 672  degrees of freedom
## Residual deviance: 38855  on 667  degrees of freedom
## AIC: 4653.5
## 
## Number of Fisher Scoring iterations: 2
## H3N2 fit to age plus imprinting, plus fixed effects of vaccination, av use, underlying conditions, NO COUNTRY AND SEASON
fit4 = glm(symdur ~ ns(age, knots = 65) + p.g2.protection + anyvac + anyav + anydx, data = train.H3N2)
summary(fit4)
## 
## Call:
## glm(formula = symdur ~ ns(age, knots = 65) + p.g2.protection + 
##     anyvac + anyav + anydx, data = train.H3N2)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
##  -8.326   -3.052   -1.301    0.813  142.187  
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           10.0375     2.0593   4.874 1.37e-06 ***
## ns(age, knots = 65)1  -4.7924     3.6128  -1.327 0.185124    
## ns(age, knots = 65)2  -0.6109     1.4857  -0.411 0.681062    
## p.g2.protection       -2.1789     1.6439  -1.325 0.185486    
## anyvac1                1.1802     0.6156   1.917 0.055646 .  
## anyav1                -2.6417     0.7859  -3.361 0.000821 ***
## anydx1                 0.2690     0.9040   0.298 0.766161    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 58.1871)
## 
##     Null deviance: 39766  on 672  degrees of freedom
## Residual deviance: 38753  on 666  degrees of freedom
## AIC: 4653.7
## 
## Number of Fisher Scoring iterations: 2
## Plot the fits for each model side by side
par(mfrow = c(1, 2), las = 3, mar = c(6, 6, 6, 3), cex.axis = .7)
library(gam)
plot.gam(fit00, se = T, col = 'firebrick1', main = 'Simplified Baseline')

plot.gam(fit0, se = T, col = 'firebrick1', main = 'Baseline')

plot.gam(fit1, se = T, col = 'firebrick1', main = 'Baseline + age + imp')

plot.gam(fit2, se = T, col = 'firebrick1', main = 'Baseline + age + imp')

plot.gam(fit3, se = T, col = 'firebrick1', main = 'Simple baseline + age + imp')

plot.gam(fit4, se = T, col = 'firebrick1', main = 'Simple baseline + age + imp')

pdf('003Symdur_H3N2.pdf')
plot.gam(fit4, se = T, col = 'firebrick1', main = 'Simple baseline + age + imp')
dev.off()
## quartz_off_screen 
##                 2
# Predict probs for each patient
pred00 = predict(fit00, newdata = test.H3N2)
pred0 = predict(fit0, newdata = test.H3N2)
pred1 = predict(fit1, newdata = test.H3N2)
pred2 = predict(fit2, newdata = test.H3N2)
pred3 = predict(fit3, newdata = test.H3N2)
pred4 = predict(fit4, newdata = test.H3N2)
  
# Estimate the simulated error rate
MSE = numeric(6); names(MSE) = c('simple.baseline', 'baseline', 'baeline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')
MSE[1] = mean((pred00 - test.H3N2$symdur)^2)
MSE[2] = mean((pred0 - test.H3N2$symdur)^2)
MSE[3] = mean((pred1 - test.H3N2$symdur)^2)
MSE[4] = mean((pred2 - test.H3N2$symdur)^2)
MSE[5] = mean((pred3 - test.H3N2$symdur)^2)
MSE[6] = mean((pred4 - test.H3N2$symdur)^2)
sort(MSE)
##         simple.baseline     simple.baseline+age simple.baseline+age+imp 
##                20.47214                20.85406                20.88790 
##                baseline             baeline+age        baseline+age+imp 
##                22.51058                22.85079                23.08748
AIC = c(fit00$aic, fit0$aic, fit1$aic, fit2$aic, fit3$aic, fit4$aic); names(AIC) =  c('simple.baseline', 'baseline', 'baseline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')
del.AIC = sort(AIC - min(AIC))
del.AIC
##         simple.baseline                baseline     simple.baseline+age 
##                0.000000                3.274308                3.570356 
## simple.baseline+age+imp        baseline+age+imp            baseline+age 
##                3.797487                5.236422                6.397545
load('master.AIC.table.H3N2.RData')
master.AIC.table['003Symdur',   c('simple.baseline', 'baseline', 'baseline+age', 'baseline+age+imp', 'simple.baseline+age', 'simple.baseline+age+imp')] = AIC - min(AIC)
save(master.AIC.table, file = 'master.AIC.table.H3N2.RData')
rm(master.AIC.table)